Recovering Latent Information in Treebanks

Authors

David Chiang

Daniel M. Bikel

Abstract

Many recent statistical parsers rely on a preprocessing step that uses hand-written, corpus-specific rules to augment the training data with extra information. For example, head-finding rules are used to augment node labels with lexical heads. In this paper, we provide machinery to reduce the human effort needed to adapt existing models to new corpora. First, we propose a flexible notation for specifying these rules that would allow them to be shared by different models. Second, we report on an experiment testing whether Expectation-Maximization can automatically fine-tune a set of hand-written rules to a particular corpus.
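To make the head-finding example concrete, here is a minimal Python sketch of how such rules might augment node labels with lexical heads. It is an illustration only, not the notation proposed in the paper: the `Node` class, the `HEAD_RULES` table, and its three entries are hypothetical, loosely modeled on the kind of priority-list rules commonly written for the Penn Treebank.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminal (part-of-speech) nodes
    head: Optional[str] = None  # lexical head, filled in by annotate_heads


# Hypothetical rule table (an assumption for illustration):
# parent label -> (scan direction, child labels in priority order)
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("right", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
}


def find_head_child(node: Node) -> Node:
    """Pick the head child of an internal node using HEAD_RULES."""
    direction, priorities = HEAD_RULES.get(node.label, ("left", []))
    order = node.children if direction == "left" else list(reversed(node.children))
    for wanted in priorities:
        for child in order:
            if child.label == wanted:
                return child
    # Fallback when no rule matches: first child in scan order.
    return order[0]


def annotate_heads(node: Node) -> str:
    """Bottom-up pass that stores each node's lexical head and returns it."""
    if node.word is not None:
        node.head = node.word
        return node.head
    for child in node.children:
        annotate_heads(child)
    node.head = find_head_child(node).head
    return node.head


def show(node: Node) -> str:
    """Render the tree in bracketed form, with heads on internal nodes."""
    if node.word is not None:
        return f"({node.label} {node.word})"
    inner = " ".join(show(c) for c in node.children)
    return f"({node.label}[{node.head}] {inner})"


tree = Node("S", [
    Node("NP", [Node("DT", word="the"), Node("NN", word="dog")]),
    Node("VP", [
        Node("VBD", word="chased"),
        Node("NP", [Node("DT", word="the"), Node("NN", word="cat")]),
    ]),
])
annotate_heads(tree)
annotated = show(tree)
print(annotated)
```

In this sketch the rules live in a plain data table rather than in parser code, which is one simple way a rule set could be shared across models. Adapting to a new corpus would then mean editing `HEAD_RULES` only.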

Similar resources

Parsing German with Latent Variable Grammars

We describe experiments on learning latent variable grammars for various German treebanks, using a language-agnostic statistical approach. In our method, a minimal initial grammar is hierarchically refined using an adaptive split-and-merge EM procedure, giving compact, accurate grammars. The learning procedure directly maximizes the likelihood of the training treebank, without the use of any la...

Using multivariate generalized linear latent variable models to measure the difference in event count for stranded marine animals

BACKGROUND AND OBJECTIVES: The classification of marine animals as protected species makes data and information on them very important. This has led to the need to retrieve and understand the data on the event counts for stranded marine animals based on location emergence, number of individuals, behavior, and threats to their presence. Whales are g...

The Xavier Module – Information Processing of Treebanks

This paper introduces the Xavier module, a program package for processing treebanks (in particular, the Sejong Korean Treebank). The procedure for implementing Xavier is discussed, and the main usage of the program is also described. Though this paper focuses on the Sejong Korean Treebank, Xavier is also applicable to other treebanks, such as the Penn Treebanks, because it has been ...

Latent Semantic Clustering of German Verbs with Treebank Data

Treebank data have been utilized as data sources for a wide range of tasks in computational linguistics, including statistical parsing, anaphora resolution, induction of valence lexica, etc. More recently, researchers have experimented with extracting semantic information from syntactically annotated data. Here, treebank data have been used for the purposes of identifying selectional preference...

Reconstructing Requirements Traceability in Design and Test Using Latent Semantic Indexing

Managing traceability data is an important aspect of the software development process. In this paper we define a methodology, consisting of six steps, for reconstructing requirements views using traceability data. One of the steps concerns the reconstruction of the traceability data. We investigate to what extent Latent Semantic Indexing (LSI), an information retrieval technique, can help recov...

Publication date: 2002
